Nonparametric Model-Based Reinforcement Learning
Abstract
This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.
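The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The kernel-regression model, the quadratic task cost, the weight `lam`, and all function names are assumptions made here for illustration. A fully consistent planner corresponds to trajectories produced by `rollout` (zero model violation), while a balancing planner would minimize `soft_objective` over states and actions jointly, treating disagreement with the learned model as a penalty rather than a hard constraint.

```python
import numpy as np

def kernel_model(X, Y, bandwidth=0.5):
    """Fit a Nadaraya-Watson regressor (a simple nonparametric model, assumed here).

    X has shape (n, 2) with rows (state, action); Y has shape (n,) with next states.
    """
    def predict(query):
        d2 = np.sum((X - query) ** 2, axis=1)          # squared distance to each sample
        w = np.exp(-0.5 * d2 / bandwidth ** 2)         # Gaussian kernel weights
        return float(np.dot(w, Y) / (np.sum(w) + 1e-12))
    return predict

def task_cost(states, actions, goal):
    """Quadratic distance-to-goal cost plus a small action penalty (illustrative choice)."""
    return float(np.sum((states[1:] - goal) ** 2) + 0.01 * np.sum(actions ** 2))

def model_violation(states, actions, predict):
    """Sum of squared gaps between planned next states and the model's predictions."""
    pred = np.array([predict(np.array([x, u])) for x, u in zip(states[:-1], actions)])
    return float(np.sum((states[1:] - pred) ** 2))

def soft_objective(states, actions, predict, goal, lam):
    """Task cost plus lam times model violation; large lam approaches full consistency."""
    return task_cost(states, actions, goal) + lam * model_violation(states, actions, predict)

def rollout(x0, actions, predict):
    """Generate a trajectory that obeys the learned model exactly."""
    states = [x0]
    for u in actions:
        states.append(predict(np.array([states[-1], u])))
    return np.array(states)
```

With little data, a small `lam` lets an optimizer move states toward the goal even where the learned model is unreliable. Raising `lam` as the model improves pulls the plan back toward model-consistent trajectories.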
Similar resources
Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments
By Miao Liu, Department of Electrical and Computer Engineering, Duke University.
Nonparametric Bayesian Policy Priors for Reinforcement Learning
We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and s...
Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions
We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We pr...
Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains
The objective of my doctoral research is to bring together two fields, partially observable reinforcement learning (PORL) and nonparametric Bayesian statistics (NPB), to address issues of statistical modeling and decision-making in complex, real-world domains.